(57) Abstract: A method for non-targeted complex sample analysis which involves the following steps. A first step involves
providing a database (16) containing identifying data of known molecules. A second step involves introducing a complex sample
containing multiple unidentified molecules into a Fourier Transform Ion Cyclotron Mass Spectrometer (12) to obtain data regarding 
the molecules in the complex sample. A third step involves comparing the collected data regarding the molecules in the complex 
sample with the identifying data of known molecules in order to arrive at an identification through comparison of the molecules in 
the sample. 
METHOD OF NON-TARGETED COMPLEX SAMPLE ANALYSIS

FIELD OF THE INVENTION

The present invention relates to a method of non-targeted complex sample analysis, with
particular application to biology and, more specifically, to genomics.

BACKGROUND OF THE INVENTION 

Functional genomics is an emerging field in biotechnology that focuses on the
characterization of gene function. Each organism contains only one genotype. However, the
expression of this genotype under varying developmental and environmental conditions 
results in an almost infinite number of possible phenotypes. It is the correlation of gene 
expression to phenotype that defines functional genomics. To properly study a gene we
need not only to know its identity (i.e. sequence) but also to be able to observe and characterize
its expression patterns in response to developmental and environmental changes, in 
isolation as well as in relation to the other genes in the genome. To properly study the 
effects resulting from the expression of a gene we need to be able to characterize the 
phenotype resulting from this activity in an objective and quantifiable manner. This is
what the non-targeted metabolic profiling technology described herein enables
the functional genomics community to do.

The gene sequences of entire species are now known. Gene-chip technology has 
made it possible to monitor and quantify, simultaneously, the changes in expression of each and every gene
within the genome in response to developmental and environmental changes. Gene-chip
technology is, in essence, non-targeted gene expression analysis even though it is, in
actuality, a targeted analysis that happens to contain all of the possible targets. This
is a powerful comprehensive capability, but it was made possible by the fact that the 
genome is a finite and unitary entity. The analogous phenotypic capability would be to 
have every metabolite and protein of an organism known and on a chip. This is not
possible because not only are there multiple phenotypes, but a virtually infinite
number of metabolites and proteins are possible. To be complementary to the current state
of genomic analysis, phenotypic analysis must be non-targeted in "actuality". The non-targeted
metabolic profiling technology described herein is the only platform that satisfies
the requirements of non-targeted phenotypic analysis. Furthermore, this technology is not 
restricted to any one species, but is equally effective in all plant and animal species. 

Deciphering the complex molecular makeup of an individual phenotype is a 
formidable task. To be able to accurately and reproducibly generate this phenotypic
information in such a way that the virtually infinite number of possible phenotypes can be 
compared to one another and correlated to gene expression is the crux of the dilemma that 
faces functional genomics. On the molecular level, the phenotype of a given biological 
system can be divided into the proteome and the metabolome. Since gene expression
results in protein synthesis, the proteome is the first and most direct link to gene
expression. However, due to the complex interactions of metabolic pathways, it is difficult 
to predict the effects that changes in the expression levels of a given protein will have on 
the overall cellular processes that it may be involved in. The metabolome, on the other 
hand, is the summation of all metabolic (proteomic) activities occurring in an organism at
any given point in time. The metabolome is therefore a direct measure of the overall or
end effect of gene expression on the cellular processes of any given biological system at 
any given time. For this reason, the metabolome should prove to be the more powerful of 
the two phenotypes in actually understanding the effects of gene function and 
manipulation. The non-targeted metabolic profiling technology described herein is the
only comprehensive metabolic profiling technology available.

Isolation, identification, and quantitation are the three fundamental requirements of 
all analytical methods. The primary challenge for a non-targeted metabolome analysis is to 
meet these requirements for all of the metabolites in the metabolome, simultaneously. The 
second and perhaps more difficult challenge is to be able to meet these requirements with
sufficient throughput and long-term stability such that it can be used side by side with
gene-chip technology. Such technology will drastically reduce the time that is required for 
the function of a particular gene to be elucidated. In addition, databases of such analyses 
enable very large numbers of phenotypes and genotypes to be objectively and 
quantitatively compared. There is no such product or technology available to functional
genomics scientists at this time. The non-targeted metabolic profiling technology
described herein has been extensively tested in multiple species. In all cases, the
technology has verified the metabolic variations known to exist between various genotypes
and developmental stages of different species. 

Key Technology Concept. The non-targeted metabolic profiling technology described 
herein can separate, quantify and identify all of the components in a complex biological 
sample quickly and simultaneously. This is achieved without any a priori selection of the 
metabolites of interest and is therefore unbiased. These data are exported to a database that 
allows the researcher to directly compare one sample to another (e.g. mutant vs. wild-type,
flowering vs. stem elongation, drought stress vs. normal growing conditions, etc.) or to 
organize the entire database by metabolite concentration (i.e. which genotype has the 
greatest or least expression of a given metabolite). This technology is equally applicable to 
the study of human disease. To make use of this information, the researcher just types in 
the empirical formula(s) or the accurate mass(es) of the metabolite(s) he or she is interested 
in and the software will organize the data accordingly. 

The ability to conduct an analysis of the composition of substances in biological samples is 
critical to many aspects of health care, environmental monitoring as well as the product 
development process. Typically the amount of a specific substance in a complex mixture 
is determined by various means. For example, in order to measure analytes in a complex 
mixture, the analyte(s) of interest must be separated from all of the other molecules in the 
mixture and then independently measured and identified. 

In order to separate the analytes in a complex mixture from one another, unique chemical 
and/or physical characteristics of each analyte are used by the researcher to resolve the 
analytes from one another. These unique characteristics are also used to identify the 
analytes. In all previously published reports of complex mixture analysis, the 
methodologies require known analytical standards of each potential analyte before the 
presence and/or identity of a component in the unknown sample can be determined. The 
analytical standard(s) and the unknown sample(s) are processed in an identical manner 
through the method and the resulting characteristics of these standards recorded (for 
example: chromatographic retention time). Using this information, a sample containing 
unknown components can be analyzed and if a component in the unknown sample displays 
the same characteristic as one of the known analytical standard(s), the component is
postulated to be the same entity as the analytical standard. This is targeted analysis 
technology. Targeted analysis technology is one-way. The researcher can go from known 
standard to methodology characteristics but not from methodology characteristics to 
known standard. The researcher can only confirm or refute the presence and/or amount of
one of the previously analyzed standards. The researcher cannot go from the method 
characteristics of an unknown analyte to its chemical identity. The major drawback of this 
type of analysis is that any molecule that was not identified prior to analysis is not 
measured. As a result, much potentially useful information is lost to the researcher. To be 
truly non-targeted, the method must allow the researcher to equally evaluate all of the
components of the mixture, whether they are known or unknown. This is only possible if 
the defining physical and/or chemical characteristics of the analyte are not related to the 
method of analysis but are inherent in the composition of the analyte itself (i.e. its atomic 
composition and therefore its accurate mass). 


Key benefits of non-targeted metabolic profiling technology 

1. Multidisciplinary. Essentially only one set of analyses would need to be performed on a
given sample and the data resulting from this analysis would be available to all scientists 
regardless of the area of research they are focusing on. 

2. Comprehensive. The non-targeted approach assesses ALL metabolite changes and will
thus lead to a faster and more accurate determination of gene function/dysfunction.
3. Unknown Metabolite Discovery. The non-targeted approach has the potential of 
identifying key metabolic regulators that are currently unknown, and which would not be 
monitored in a targeted analysis scenario. 

4. High Throughput. The system can be fully automated and analysis time is short,
allowing hundreds of samples to be analyzed per instrument per day.

5. Quantitative. The system is reproducible and has an effective dynamic range > 10^4.
Relative changes in metabolite expression over entire populations can be studied. 

Business Impact of Technology. The ability to generate searchable databases of the
metabolic profiles of a given organism will represent a revolution in how the effects of 
genetic manipulation on a species can be studied. Currently our knowledge of the actual 
genetic code is much greater than our knowledge of the functions of the genes making up
this code. After the mapping of the genome, the next greatest challenge will be 
determining the function and purpose of these gene products and how manipulation of 
these genes and their expression can be achieved to serve any number of purposes. The 
time, energy, and cost of investigating the effects of genetic manipulation are great. A
database that can be searched for multiple purposes and which contains direct measures of 
the metabolic profiles of specific genotypes has the potential to dramatically decrease the 
amount of time required to determine the function of particular gene products. Such a 
database will reduce the risk of investing a large amount of time and resources researching
genes which may have effects on protein expression but, due to downstream feedback
mechanisms, have no net effect on metabolism at the whole cell or organism level.

In an article published in CURRENT OPINION IN PLANT BIOLOGY in 1999 entitled
"Metabolic Profiling: a Rosetta Stone for genomics?", Trethewey, Krotzky and Willmitzer 
indicated that exponential developments in computing have opened up the "possibility" of
conducting non-targeted experimental science. While recognizing that it would not be
possible to work with infinite degrees of freedom, the opinion was advanced that the power 
of post-experimental data processing would make possible this non-targeted approach. 
The non-targeted approach described in that article dealt only with the post-acquisition
analysis of metabolite data; not the non-targeted collection of metabolite data. 


Thus the feasibility of non-targeted analysis of complex mixtures is neither obvious nor 
simple. The three major problems surrounding the non-targeted analysis of complex 
mixtures are: the ability to separate and identify all of the components in the mixture; the 
ability to organize the large amounts of data generated from the analysis into a format that 
can be used for research; and the ability to acquire this data in an automated fashion and in
a reasonable amount of time. 



SUMMARY OF THE INVENTION

What is required is a method of non-targeted complex sample analysis. 

According to the present invention there is provided a method for non-targeted complex
sample analysis that involves the following steps. A first step involves providing a 
database containing identifying data of known molecules (this database contains the 
elemental compositions of all molecules previously identified in nature, organized by 
species, metabolic processes, subcellular location, etc.). A second step involves
introducing a complex sample containing multiple unidentified molecules into a Fourier
Transform Ion Cyclotron Mass Spectrometer to obtain data regarding the molecules in the 
complex sample. A third step involves comparing the collected data regarding the 
molecules in the complex sample with the identifying data of known molecules in order to 
arrive at an identification through comparison of the molecules in the sample. Molecules
that are not represented in the database (i.e. unknowns) are automatically identified by
determining their empirical formula. Thus, the method allows rapid identification of new
molecules within the complex mixture related to specific molecules already identified, as
well as identification of those molecules within the complex mixture that bear no
relationship to the classes or categories of molecules already defined. As a result the
analysis of complex mixtures is greatly simplified.

The invention, as described, uses the high resolving power of Fourier Transform Ion 
Cyclotron Mass Spectrometry (FTMS) to separate all of the components within the 
mixture that have different empirical formulas. This has been shown for petroleum
distillates, but not for aqueous biological samples ionized in a "soft" ionization mode,
where adduct ions can be problematic. The accurate mass capability of FTMS that enables 
the determination of empirical formula has been widely established. Furthermore FTMS is 
capable of performing high resolution/accurate mass 2D MS/MS which provides structural 
information that can be used to confirm the identities of components that have identical
empirical formulas and allows the organization of metabolites based upon common
structural components. This capability has been shown by isolated research groups but is
not available on a commercial instrument. By integrating these capabilities with an
automated sample injection system and an automated data integration and database system,
all of the components within a complex mixture can be analyzed rapidly and 
simultaneously. The data is then exported into a database that can be searched and 
organized by sample or analyte. It is to be noted that, unlike the approach advocated by
Trethewey, Krotzky and Willmitzer, the present method is not dependent upon
advances in post-experimental data processing. The non-targeted metabolic profiling
technology described herein generates a dataset that is simple and compact. Computing 
technology capable of organizing and interpreting the described databases is readily 
available. No new advances are required. Furthermore, the technology does not have the 
finite limits inherent in the approach of Trethewey, Krotzky and Willmitzer.

BRIEF DESCRIPTION OF THE DRAWINGS 

These and other features of the invention will become more apparent from the following 
description in which reference is made to the appended drawings and figures, the drawings
and figures are for the purpose of illustration only and are not intended to in any way limit 
the scope of the invention to the particular embodiment or embodiments shown, wherein: 

FIGURE 1 is a side elevation view depicting non-targeted analysis of complex samples in 
accordance with the teachings of the present invention.

FIGURE 2 is an illustration of raw data (mass spectrum) collected from the FTMS 
showing how the metabolites in the complex mixture are separated from one another. Mass 
range displayed 100-350 amu. 


FIGURE 3 is an illustration of raw data (mass spectrum) collected from the FTMS 
showing how the metabolites in the complex mixture are separated from one another. 10
amu mass range displayed. 

FIGURE 4 is an illustration of raw data (mass spectrum) collected from the FTMS showing
how the metabolites in the complex mixture are separated from one another. 1 amu mass 
range displayed. 
FIGURE 5 is an illustration of raw data (mass spectrum) collected from the FTMS
showing how the metabolites in the complex mixture are separated from one another.
Mass range displayed 100-350 amu; 0.1 amu window.


FIGURE 6 is an illustration of the strawberry pigment pathway (comparison of different
developmental stages of an organism). 

FIGURE 7 is an illustration of the extracted mass spectra of Phenylalanine from strawberry 
extracts from different developmental stages.

FIGURE 8 is an illustration of the extracted mass spectra of Cinnamate from strawberry 
extracts from different developmental stages. 

FIGURE 9 is an illustration of the extracted mass spectra of 4-Coumarate from strawberry
extracts from different developmental stages. 

FIGURE 10 is an illustration of the extracted mass spectra of Naringenin from strawberry 
extracts from different developmental stages. 


FIGURE 11 is an illustration of the extracted mass spectra of Pelargonidin from strawberry
extracts from different developmental stages. 

FIGURE 12 is an illustration of the extracted mass spectra of Pelargonidin-3-glucoside 
from strawberry extracts from different developmental stages.

FIGURE 13 is an illustration of glucosinolate mutants in Arabidopsis thaliana (comparison 
of genetic mutants to wild-type and identification of unknown metabolites). Relative 
changes in 3-Methylthiobutyl Glucosinolate illustrated. 

FIGURE 14 is an illustration of glucosinolate mutants in Arabidopsis thaliana (comparison
of genetic mutants to wild-type and identification of unknown metabolites). Relative 
changes in 3-Methylsulphinylpropyl Glucosinolate illustrated. 

FIGURE 15 is an illustration of glucosinolate mutants in Arabidopsis thaliana (comparison
of genetic mutants to wild-type and identification of unknown metabolites). Relative 
changes in 3-Methylsulphinylheptyl Glucosinolate illustrated. 

FIGURE 16 is an illustration of Tobacco Flower Analysis (Location of metabolite 
expected to be responsible for red color in tobacco).

FIGURE 17 is an illustration of Tobacco Flower Analysis (Location of unknown 
metabolite potentially involved in tobacco color). 

FIGURE 18 is an illustration of Observed Metabolic Changes in Strawberry Development.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 

The preferred embodiment of the method of non-targeted complex sample analysis will now be
described with reference to FIGURE 1. The purpose of this invention is to provide a means
of analyzing large numbers of complex samples, for example biological extracts, and be 
able to analyze the information in a non-targeted fashion after the analysis is complete to 
determine the differences between samples. 


In the invention, complex samples are directly injected into the FTMS 12 through the use of
an autosampler 14 with or without the additional use of a chromatographic column. The 
components of the mixture are ionized by one of many potential "soft" ionization sources 
(electrospray, APCI, FAB, SIMS, MALDI, etc.) and then transferred into the ion cyclotron 
resonance (ICR) cell with or without additional mass-selective pre-separation (quadrupole,
hexapole, etc.). The ions are then separated and measured in the ICR cell with or without 
simultaneous MS/MS occurring. The data collected (mass spectrum) is integrated (the 
mass, relative intensity, and absolute intensity of each ion are determined) and processed, with
or without calibration with known molecules of known concentrations. These data, with or 
without isotope elimination and empirical formula calculation, are then transferred to a 
database 16 that organizes and stores the data for future comparisons and functional 
analyses. Once stored in the database, individual samples can be compared with one
another and those molecules that show different concentrations between the selected 
samples can be displayed. The entire database can be searched for specific molecules. 
The samples in the database can be listed from highest to lowest concentration or vice
versa. The molecules detected in the analysis can be compared with a database of known
molecules and the molecules automatically identified. For molecules that do not match
known molecules, the most likely empirical formulas can be displayed. 
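
By way of illustration only, the comparison of a measured ion mass against a database of known molecules may be sketched as follows. The sketch assumes that identification is made by accurate mass alone, within a relative tolerance expressed in parts per million (ppm). The function names, the 1 ppm tolerance and the two reference entries (the [M+H]+ masses calculated from the empirical formulas C9H11NO2 and C9H8O2) are chosen for this example and are not taken from the described embodiment.

```python
# Illustrative sketch: identify a measured ion by accurate mass.
# KNOWN_MH, ppm_error and identify are hypothetical names for this example.

# [M+H]+ accurate masses calculated from the empirical formulas.
KNOWN_MH = {
    "phenylalanine": 166.086255,   # C9H11NO2 + H+
    "cinnamic acid": 149.059706,   # C9H8O2 + H+
}

def ppm_error(measured, theoretical):
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6

def identify(measured_mz, known, tol_ppm=1.0):
    """Return (name, ppm error) for every known ion within the tolerance,
    best match first. An empty list means the ion is an unknown."""
    hits = []
    for name, theoretical in known.items():
        err = ppm_error(measured_mz, theoretical)
        if abs(err) <= tol_ppm:
            hits.append((name, err))
    return sorted(hits, key=lambda hit: abs(hit[1]))

if __name__ == "__main__":
    print(identify(166.08630, KNOWN_MH))  # matches phenylalanine
    print(identify(166.09000, KNOWN_MH))  # no match: treated as unknown
```

An ion that returns no match would, in the described method, be passed on to empirical formula calculation rather than discarded.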

This approach provides numerous advantages to the researcher. There is a dramatic 
increase in the amount of information obtained from each sample (>10x compared to the
most comprehensive targeted analysis procedure reported). Information is collected on
both known and unknown components of a mixture. There is increased efficiency of data 
collection (data collection is approximately 10x faster than reported targeted analysis
techniques). It provides a basis for unbiased comparison of unknown samples. Effects of 
gene modification on total cell metabolism can be determined instead of effects on only a
small subset of metabolic processes (i.e. the relationship between different metabolic
processes can be studied). By analyzing all metabolites the actual step within a metabolic 
process that is disrupted can be determined. Gene modifications that have an effect on 
protein expression but no net effect on cell metabolism can be identified. All of these 
analyses are completed simultaneously in one fast analysis, whereas multiple time-consuming
analyses would have to be performed to get identical data at a tremendously
higher cost. 

Many examples exist for the use of FTMS for the analysis of complex mixtures, but none 
have introduced the concept of non-targeted analysis followed by database formation. The 
described method recognizes and utilizes some heretofore unused capabilities in FTMS.
FTMS has the theoretical resolving power to separate all of the metabolites of different 
empirical formula in a complex biological sample. FTMS has the theoretical accurate 
mass capabilities to assign empirical formulas to all of the metabolites in the complex
biological sample. FTMS has the capability to perform 2 dimensional MS/MS on all of the 
metabolites in a complex biological sample. It is not necessary to know a priori what 
metabolites are present in a complex biological sample if the analytes can thus be
separated and then identified based upon their empirical formula and MS/MS fragment
data and/or by comparing them to a database of known analytes. Complex samples can be
compared with one another to determine what analytes had different intensities between 
the samples. A database could be organized by analyte or by common MS/MS fragments. 
This approach significantly decreases the time and resources needed to elucidate gene 
function as a result of genetic manipulation, environmental changes, or developmental
changes in an organism. One of the many applications of the described method
is gene function determination in functional genomics research.

Numerous targeted LC-MS methods as well as other screening methods have been 
developed to analyze specific molecules or groups of molecules in complex samples. The
major reason that this invention is novel and not obvious is that it employs a
fundamentally different strategy for analysis and is only possible with highly
specialized instrumentation and methodology. Although the many independent theoretical 
research capabilities of FTMS have been known for at least 10 years, FTMS has only been 
used in a targeted way and for specialized research purposes. In the past 10 years no group
has described the application of FTMS employed within the scope of the present invention. 
The present invention involves the combining of several theoretical FTMS capabilities into 
a comprehensive, non-targeted metabolic profiling procedure that has commercial utility in 
the analysis and interpretation of complex mixtures. 


The method of the present invention comprises the following steps: 

Generation of Known Metabolite Database. The identity (common name and empirical
formula) and relevant biological information (species, metabolic processes involved in,
cellular and subcellular location, etc.) of all known biological metabolites are entered into
a commercial database program (e.g. Microsoft EXCEL; Table I). The accurate
monoisotopic mass of these metabolites is automatically determined along with their
[M+H]+ and [M-H]- accurate masses (M+H and M-H refer to the mass of the metabolite
when a proton (H+) is either added to the metabolite to create a positively charged ion or
removed from the metabolite to create a negatively charged ion). The data
collected from the FTMS analysis of the complex sample can then be compared to this
database to immediately identify many of the components in the complex sample. 
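
As a minimal sketch of how the accurate monoisotopic mass and the [M+H]+ and [M-H]- masses could be derived from an empirical formula, consider the following. The function names and the restriction to the elements C, H, N, O, P and S are assumptions made for this example; the isotope masses and the proton mass are standard published values, and the formula parser handles only simple formulas without brackets.

```python
import re

# Masses (u) of the most abundant isotope of each element; standard values.
# The element set is limited to C, H, N, O, P, S for this illustration.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.972071,
}
PROTON = 1.007276466812  # mass of H+ (u)

def monoisotopic_mass(formula):
    """Sum isotope masses for a simple empirical formula such as 'C9H11NO2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total

def ion_masses(formula):
    """Neutral mass plus protonated and deprotonated ion masses."""
    m = monoisotopic_mass(formula)
    return {"M": m, "[M+H]+": m + PROTON, "[M-H]-": m - PROTON}

if __name__ == "__main__":
    # Phenylalanine, one of the metabolites shown in Figure 7.
    for label, mass in ion_masses("C9H11NO2").items():
        print(f"{label}: {mass:.6f}")
```

Adding or removing the proton mass, rather than the mass of a hydrogen atom, accounts for the electron that is absent from H+.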

Preparation of samples for analysis. The metabolites are extracted from their biological
source using any number of extraction/clean-up procedures that are typically used in
quantitative analytical chemistry. Procedures are normally tailored to the source of the
sample (e.g. leaf tissue, root tissue, blood, urine, brain). For example, a 0.1 g plant leaf
sample may be extracted by placing it, 1.0 ml of 50/50 MeOH/0.1% formic acid, and 3
small glass beads in a test tube and then vortexing for one minute to homogenize the
sample. The test tube is then centrifuged for 5 minutes. 100 µl of the supernatant is then
transferred from the test tube to a 96 well plate. The 96 well plate is placed upon the
autosampler. 20 µl of the supernatant is injected into the FTMS.

Typical operating conditions 

Solvents. 50/50 MeOH/0.1% ammonium hydroxide as the mobile phase and for dilution
for all negative ionization analyses and 50/50 MeOH/0.1% formic acid for all positive ion
analyses.

Instrumentation. Bruker Daltonics APEX III Fourier Transform Mass Spectrometer
(FTMS) equipped with a 7.0 Tesla actively shielded superconducting magnet with
electrospray (ESI) and atmospheric pressure chemical ionization (APCI) sources. ESI, APCI, and
ion transfer conditions were optimized for sensitivity and resolution using a standard mix
of serine, tetra-alanine, reserpine, HP Mix, and adrenocorticotrophic hormone fragment 4-10.
Instrument conditions were optimized for ion intensity and broadband accumulation
over the mass range of 100-1000 amu. One megaword data files were acquired and a sinm
data transformation was performed prior to Fourier transform and magnitude calculations.

Calibration. All samples were internally calibrated for mass accuracy over the
approximate mass range of 100-1000 amu using a mixture of the above-mentioned 
standards. 
Sample Analysis

Samples are introduced to the FTMS via an auto sampler, or in some cases with a syringe 
pump. When the sample solution reaches the source of the FTMS (the source is where the 
FTMS ionizes the molecules in the sample solution), then molecules are ionized according
to the principles of the particular ionization source used. The source can either be external 
to the mass analyzer or internal, depending on the type of ionization (for example in ESI 
and APCI ions are generated external to the mass analyzer and then transferred to the mass 
analyzer, whereas in electron impact ionization the molecules are ionized internal to the 
mass analyzer). The ions once generated and transferred (if necessary) to the mass
analyzer are then separated and detected in the mass analyzer based upon their mass to 
charge ratio. 



Analyte Detection

All of the analytes within the complex mixture are analyzed simultaneously (see Figures 2-5).
Structurally specific information (accurate mass with or without accurate MS/MS
fragment masses) is obtained for all of the analytes without prior knowledge of the
analyte's identity, and then this data is formatted in a way that is amenable to a
comprehensive database.

Complex Sample Database Formation 

The typical process of database formation involves the following steps: 

1. The output of the FTMS (calibrated mass spectrum) is filtered to remove all
13C isotope peaks and peaks that have mass defects that do not correspond to singly
charged biological metabolites;

2. Each of the peaks in this filtered peak list is then analyzed using the mass
analysis program that is part of the instrument manufacturer's software package
according to the elemental constraints provided by the researcher. This
program returns all of the elemental compositions that are possible at a
given mass within a certain selected error range.

3. Only the data (file name, sample ID, mass, relative intensity, absolute intensity,
empirical formula(s)) from those peaks in the filtered peak list that satisfied the
above constraints are exported to a final processed data file (Table II). Each
sample analysis results in such a final processed data file.

4. Multiple databases can then be formed from the combining and comparing of
the data files. Three such databases are:

a) Direct comparison of two samples to create a database of differences
(Table VI);

b) Combination of multiple files to create a database capable of tracking
changes through a series of samples (Table III);

c) Direct comparison of a whole series of samples to one control sample
and then the combination of all the samples in the series into one
database to allow comparisons within the series vs a common control
(Figure 8). 
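
Steps 1 and 4(a) above may be sketched, under simplifying assumptions, as follows. The sketch treats a peak as a 13C isotope peak when a more intense peak lies one 13C-12C mass difference (1.0033548 u) below it, and it reports two samples as different at a given mass when the intensities differ by at least a chosen fold change or when the peak is absent from one sample. The function names, the 0.001 u matching window and the two-fold threshold are hypothetical values chosen for this example, and the mass-defect filter of step 1 is not shown.

```python
C13_SPACING = 1.0033548378  # mass difference between 13C and 12C (u)

def remove_c13_peaks(peaks, tol=0.001):
    """peaks: list of (mass, intensity). Drop any peak that sits one 13C
    spacing above a more intense peak (assumed to be its monoisotopic peak)."""
    kept = []
    for mass, intensity in peaks:
        is_isotope = any(
            abs(mass - (m0 + C13_SPACING)) <= tol and i0 > intensity
            for m0, i0 in peaks
        )
        if not is_isotope:
            kept.append((mass, intensity))
    return kept

def differences(sample_a, sample_b, tol=0.001, min_fold=2.0):
    """Return (mass, intensity_a, intensity_b) rows for peaks that are missing
    from one sample or differ by at least min_fold. Intensities are assumed
    to be positive; 0.0 marks a peak that was not detected."""
    rows = []
    for mass_a, int_a in sample_a:
        match = next(
            (int_b for mass_b, int_b in sample_b if abs(mass_a - mass_b) <= tol),
            None,
        )
        if match is None:
            rows.append((mass_a, int_a, 0.0))
        elif max(int_a, match) / min(int_a, match) >= min_fold:
            rows.append((mass_a, int_a, match))
    for mass_b, int_b in sample_b:
        if not any(abs(mass_a - mass_b) <= tol for mass_a, _ in sample_a):
            rows.append((mass_b, 0.0, int_b))
    return sorted(rows)

if __name__ == "__main__":
    raw = [(166.0863, 100.0), (167.0896, 10.0), (149.0597, 50.0)]
    control = remove_c13_peaks(raw)
    mutant = [(166.0863, 20.0), (149.0597, 55.0), (271.0601, 30.0)]
    for row in differences(control, mutant):
        print(row)
```

Each row of the result corresponds to one entry in a database of differences between the two samples.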


The utility of the invention is illustrated in the following examples: 

I. The ability to compare different developmental stages of an organism (Figures 6-12,
Table IV).

In this example, we looked at the pigment pathway in strawberries. Figure 6
shows the full metabolic pathway. Figures 7-12 show the various metabolites in the 
pathway that we observed. It is to be noted that we were able to look at molecules of 
vastly different chemical compositions (amino acid, acid, flavenoid, glucoside). Here we 
were able to see the changes within a single genotype (red strawberry) as a function of 

25 developmental stage (green - white - turning - red) and compare it to a different genotype 
(white mutant). Only the non-targeted metabolic profiling technology described herein has 
this broad of a spectrum. Furthermore, as indicated in Table IV, these changes in the 
metabolome are directly correlated with changes in gene expression. 

II. The ability to compare different genotypes (Figures 13-15, Table V).

In this example three different Arabidopsis thaliana mutants (TU1, TU3, TU5) that are
known to have changes in the content and concentration of glucosinolates were compared
to a wild-type (WT). In this instance the non-targeted metabolic profiling technology
described herein was able to confirm previous results as well as identify glucosinolate
changes that had never before been observed.

III. The ability to detect and identify unknown metabolites involved in key pathways
(Figures 16 and 17, Table IX).

In this example the flowers of a control (red) tobacco were compared to those of a white
mutant. It was expected that the glucoside (Figure 16) was the metabolite responsible for
color. However, when analyzed by the non-targeted metabolic profiling method, the
expected metabolite was not observed. An unknown metabolite (Figure 17) was detected and
identified (Table IX) as the metabolite responsible for tobacco flower color.

IV. The ability to compare the effects of different environmental conditions on an
organism (Table VI).

In this example the exudate from a carrot root grown under normal growing conditions
(sufficient phosphate) was compared to the exudate from a carrot root grown under
abnormal growing conditions (insufficient phosphate). Using non-targeted metabolic
profiling we were able to identify key plant hormones that are excreted to promote
symbiotic fungal growth under conditions of low phosphate.

V. The ability to group and classify metabolites based upon accurate MS/MS data (Table
VII and Table VIII).

In this example accurate MS/MS fragmentation data were collected on the metabolites that
were observed to be increased in the low phosphate conditions described above. Classes of
molecules that have a similar substructure can be grouped together (in this case all
metabolites with the C10H9N6O2 fragment). This capability greatly enhances the ability
to search and characterize different complex mixtures.
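Grouping by shared substructure amounts to inverting a parent-to-fragment map. The sketch below is a minimal illustration under stated assumptions: the function name `group_by_fragment` is hypothetical, and the input values are illustrative entries loosely modelled on the parent masses and fragment formulas of Table VII rather than a transcription of it.

```python
from collections import defaultdict

def group_by_fragment(msms):
    """Map each fragment formula to the sorted list of parent ions that produced it.
    `msms` maps parent nominal mass -> iterable of fragment formula strings."""
    groups = defaultdict(list)
    for parent, fragments in msms.items():
        for frag in set(fragments):  # count each fragment once per parent
            groups[frag].append(parent)
    return {frag: sorted(parents) for frag, parents in groups.items()}

# Illustrative input only (assumed values, not a transcription of Table VII).
msms = {
    651: ["C19H23N6O5", "C10H9N6O2", "C9H7"],
    619: ["C19H23N6O5", "C10H9N6O2", "C9H7"],
    177: ["C8H5O3", "C6H5O"],
}
shared = group_by_fragment(msms)
# Parents that share the C10H9N6O2 fragment form one class.
related = shared["C10H9N6O2"]
```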




VI. The ability to comprehensively monitor the metabolites of an organism (Table X,
Figure 18).

In our study of the developmental stages of strawberry, we characterized the number of
metabolites that were observed as well as the number of metabolites that were observed
to have changed in concentration between the different developmental stages. It is the
comprehensive nature of this method that allows one to monitor and evaluate virtually all
ongoing metabolic processes independently or in relation to one another. No other
technology has this capability.



Table I Example of Known Metabolite Database

                                                                  Monoisotopic Masses
Common Name              Process Abbrev.   C   H   N   O   P   S  M         M+H       M-H
Glyoxylate                                 2   2       3          74.0004   75.0076   72.9932
Glycine                          Gly,G     2   5   1   2          75.0320   76.0392   74.0248
pyruvic acid                     PA        3   4       3          88.0160   89.0233   87.0088
L-Alanine                        Ala,A     3   7   1   2          89.0477   90.0549   88.0404
Lactic Acid                                3   6       3          90.0317   91.0389   89.0245
Cytosine                                   3   5   3   1          99.0432   100.0505  98.0360
Acetoacetic acid                           4   6       3          102.0317  103.0389  101.0245
gamma aminobutyrate              GABA      4   9   1   2          103.0633  104.0705  102.0561
L-serine                                   3   7   1   3          105.0426  106.0498  104.0354
Histamine                                  5   9   3              111.0796  112.0869  110.0724
Uracil                                     4   4   2   2          112.0273  113.0345  111.0200
3-cyanoalanine                             4   6   2   2          114.0429  115.0501  113.0357
L-Proline                        Pro,P     5   9   1   2          115.0633  116.0705  114.0561
L-Valine                         Val,V     5  11   1   2          117.0790  118.0862  116.0717
succinate                                  4   6       4          118.0266  119.0338  117.0194
L-Homoserine                               4   9   1   3          119.0582  120.0655  118.0510
L-Threonine                      Thr,T     4   9   1   3          119.0582  120.0655  118.0510
phosphoenolpyruvic acid          PEP       3   6       3   1      121.0054  122.0127  119.9982
L-cysteine                       Cys,C     3   7   1   2       1  121.0197  122.0270  120.0125
Nicotinic Acid                             6   5   1   2          123.0320  124.0392  122.0248
Thymine                                    5   6   2   2          126.0429  127.0501  125.0357
L-Isoleucine                     Ile,I     6  13   1   2          131.0946  132.1018  130.0874
L-Leucine                        Leu,L     6  13   1   2          131.0946  132.1018  130.0874
oxaloacetic acid                 OAA       4   4       5          132.0059  133.0131  130.9986
L-asparagine                     Asn,N     4   8   2   3          132.0535  133.0607  131.0462
L-Ornithine                                5  12   2   2          132.0899  133.0971  131.0826
L-Aspartate                      Asp,D     4   7   1   4          133.0375  134.0447  132.0303
Ureidoglycine                              3   7   3   3          133.0487  134.0559  132.0415
L-malic acid                               4   6       5          134.0215  135.0287  133.0143
Ureidoglycolate                            3   6   2   4          134.0327  135.0400  133.0255
L-Homocysteine                             4   9   1   2       1  135.0354  136.0426  134.0282
Adenine (Vitamin B4)                       5   5   5              135.0545  136.0617  134.0473
Adenine                                    5   5   5              135.0545  136.0617  134.0473
3-Methyleneoxindole      Auxins            9   7   1   1          145.0528  146.0600  144.0455
Indolealdehyde           Auxins            9   7   1   1          145.0528  146.0600  144.0455
Indolenine epoxide       Auxins            9   7   1   1          145.0528  146.0600  144.0455
alpha-Ketoglutarate                        5   6       5          146.0215  147.0287  145.0143
L-Glutamine                      Gln,Q     5  10   2   3          146.0691  147.0763  145.0619
L-Lysine                         Lys,K     6  14   2   2          146.1055  147.1127  145.0983
L-Glutamate                      Glu,E     5   9   1   4          147.0531  148.0604  146.0459
L-Methionine                     Met,M     5  11   1   2       1  149.0510  150.0583  148.0438
D-ribose                                   5  10       5          150.0528  151.0600  149.0456
Guanine                                    5   5   5   1          151.0494  152.0566  150.0422
Indole-3-acetonitrile    Auxins  IAN      10   7   2              155.0609  156.0681  154.0537

Comments: Any molecule of known chemical composition can be added to the database at
any time. The database is comprised of accurate monoisotopic masses. All molecules that
have a unique empirical formula will have a unique accurate mass. This mass is a constant
and is independent of the methodologies discussed herein, making it possible to analyze
all of the components in a complex sample in a non-targeted fashion.
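Because the accurate mass follows directly from the empirical formula, an entry of the kind shown in Table I can be computed rather than looked up. The following is a minimal sketch, not the procedure used to build the table: the function names are hypothetical, the isotope masses are standard values, and the results may differ from the tabulated figures in the last decimal place because of rounding.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146,
        "P": 30.9737620, "S": 31.9720707}
PROTON = 1.00727646  # mass of H+ (hydrogen atom minus one electron)

def monoisotopic_mass(formula):
    """Sum monoisotopic atomic masses for a formula string such as 'C2H5NO2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONO[element] * (int(count) if count else 1)
    return total

def database_row(formula):
    """Return (M, M+H, M-H) rounded to four decimals, as laid out in Table I."""
    m = monoisotopic_mass(formula)
    return round(m, 4), round(m + PROTON, 4), round(m - PROTON, 4)

glycine = database_row("C2H5NO2")  # compare with the Glycine row of Table I
```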

Figure 2 shows two raw mass spectra. The top one is from the extract of a green stage
strawberry and the lower one is from the extract of a red stage strawberry. Over 500
unique chemical entities were observed over the mass range displayed (100-350 amu, which
is only a subset of the entire mass range analyzed (100-5000)). Figures 3, 4, and 5 show
progressively smaller mass ranges to illustrate the separation of the metabolites.

Figure 5 shows that the resolution of the mass spectrum is above 165,000. This extremely
high resolution is necessary in order to separate all of the metabolites and thus be able
to compare the two samples and determine the changes, if any.
Table II Illustration of processed data (file ID, mass, intensity, empirical formula,
relative error)

File ID                   Mass        Int         C   H   N   O   P   S  Err
ESI_POS_pri_4_rs2_50_50   99.044061   2.05E+06    5   6   0   2   0   0  0.05
ESI_POS_pri_3_ts_50_50    99.044082   1.33E+06    5   6   0   2   0   0  0.26
ESI_POS_pri_3_ts_50_50    102.054929  2.56E+06    4   7   1   2   0   0  0.25
ESI_POS_pri_1_gs_50_50    102.054956  3.06E+06    4   7   1   2   0   0  0.01
ESI_POS_pri_2_ws_50_50    102.054962  1.36E+06    4   7   1   2   0   0  0.07
ESI_POS_pri_4_rs2_50_50   104.070595  1.93E+06    4   9   1   2   0   0  0.10
ESI_POS_pri_4_rs1_50_50   104.070624  1.75E+06    4   9   1   2   0   0  0.18
ESI_POS_pri_5_gs_acn      104.106977  2.73E+06    5  13   1   1   0   0  0.13
ESI_POS_pri_2_ws_50_50    104.106979  2.73E+06    5  13   1   1   0   0  0.11
ESI_POS_pri_6_ws_acn      104.106981  1.84E+06    5  13   1   1   0   0  0.09
ESI_POS_pri_1_gs_50_50    104.107     3.88E+06    5  13   1   1   0   0  0.09
ESI_POS_pri_3_ts_50_50    106.049869  1.21E+08    3   7   1   3   0   0  0.01
ESI_POS_pri_1_gs_50_50    106.04987   1.36E+08    3   7   1   3   0   0  0.00
ESI_POS_pri_2_ws_50_50    106.04987   1.63E+08    3   7   1   3   0   0  0.00
ESI_POS_pri_4_rs1_50_50   106.04987   1.08E+08    3   7   1   3   0   0  0.00
ESI_POS_pri_4_rs2_50_50   106.04987   1.53E+08    3   7   1   3   0   0  0.00
ESI_POS_pri_5_gs_acn      106.04987   2.59E+08    3   7   1   3   0   0  0.00
ESI_POS_pri_6_ws_acn      106.04987   2.45E+08    3   7   1   3   0   0  0.00
ESI_POS_pri_7_ts_acn      106.04987   2.62E+08    3   7   1   3   0   0  0.00
ESI_POS_pri_8_rs1_acn     106.04987   2.48E+08    3   7   1   3   0   0  0.00
ESI_POS_pri_8_rs2_acn     106.04987   2.33E+08    3   7   1   3   0   0  0.00
ESI_POS_pri_6_ws_acn      107.070237  1.34E+06    4  10   0   3   0   0  0.31
ESI_POS_pri_8_rs1_acn     107.070322  1.28E+06    4  10   0   3   0   0  0.48
ESI_POS_pri_7_ts_acn      108.080743  2.79E+06    7   9   1   0   0   0  0.30
ESI_POS_pri_4_rs2_50_50   109.028414  1.65E+06    6   4   0   2   0   0  0.07
ESI_POS_pri_4_rs2_50_50   111.044016  1.41E+06    6   6   0   2   0   0  0.36
ESI_POS_pri_8_rs2_acn     114.091316  2.74E+06    6  11   1   1   0   0  0.21
ESI_POS_pri_1_gs_50_50    114.091319  3.02E+06    6  11   1   1   0   0  0.19
ESI_POS_pri_4_rs1_50_50   114.091336  1.76E+06    6  11   1   1   0   0  0.04
ESI_POS_pri_5_gs_acn      114.091337  3.87E+06    6  11   1   1   0   0  0.03
ESI_POS_pri_2_ws_50_50    114.091342  2.70E+06    6  11   1   1   0   0  0.01
ESI_POS_pri_7_ts_acn      114.091346  3.26E+06    6  11   1   1   0   0  0.05
ESI_POS_pri_6_ws_acn      114.091358  3.18E+06    6  11   1   1   0   0  0.15
ESI_POS_pri_8_rs1_acn     114.091375  2.74E+06    6  11   1   1   0   0  0.30
ESI_POS_pri_4_rs2_50_50   114.091377  2.53E+06    6  11   1   1   0   0  0.32
ESI_POS_pri_3_ts_50_50    114.091404  2.21E+06    6  11   1   1   0   0  0.56
ESI_POS_pri_4_rs2_50_50   115.038958  3.43E+06    5   6   0   3   0   0  0.11
ESI_POS_pri_5_gs_acn      115.038978  2.03E+06    5   6   0   3   0   0  0.07
ESI_POS_pri_2_ws_50_50    115.038984  1.84E+06    5   6   0   3   0   0  0.12
ESI_POS_pri_8_rs1_acn     115.038999  1.57E+06    5   6   0   3   0   0  0.25
ESI_POS_pri_4_rs1_50_50   115.039032  1.86E+06    5   6   0   3   0   0  0.53
ESI_POS_pri_3_ts_50_50    115.03905   1.67E+06    5   6   0   3   0   0  0.69
ESI_POS_pri_2_ws_50_50    116.034226  1.76E+06    4   5   1   3   0   0  0.06
ESI_POS_pri_1_gs_50_50    116.034233  2.43E+06    4   5   1   3   0   0  0.12
ESI_POS_pri_3_ts_50_50    116.03425   2.07E+06    4   5   1   3   0   0  0.26
ESI_POS_pri_1_gs_50_50    116.070538  2.60E+06    5   9   1   2   0   0  0.58
ESI_POS_pri_3_ts_50_50    116.070601  1.46E+06    5   9   1   2   0   0  0.03
ESI_POS_pri_2_ws_50_50    116.070643  1.46E+06    5   9   1   2   0   0  0.33
ESI_POS_pri_4_rs1_50_50   118.086184  1.56E+06    5  11   1   2   0   0  0.60
ESI_POS_pri_1_gs_50_50    118.086217  4.10E+06    5  11   1   2   0   0  0.32
ESI_POS_pri_4_rs2_50_50   118.086231  1.52E+06    5  11   1   2   0   0  0.20
ESI_POS_pri_2_ws_50_50    118.086234  1.23E+06    5  11   1   2   0   0  0.18
ESI_POS_pri_3_ts_50_50    118.086246  2.74E+06    5  11   1   2   0   0  0.08
ESI_POS_pri_5_gs_acn      118.086249  2.53E+06    5  11   1   2   0   0  0.05



Comments: The mass spectrum is processed such that the 13C isotopes are first eliminated
(this is only possible in FTMS analysis due to the high resolution and mass accuracy).
Then the remaining peaks are automatically analyzed using the mass analysis program that
is included with the instrument, using specific constraints chosen by the researcher (in
the above example only those peaks that have the appropriate combination of carbon (C),
hydrogen (H), oxygen (O), nitrogen (N), sulfur (S), or phosphorus (P) are returned). The
final dataset now contains only monoisotopic, singly charged metabolites that have an
accuracy of measurement of less than 1 ppm (Err).
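One simple way to picture the 13C elimination is to drop any peak that sits one 13C-12C mass spacing above a more intense peak. The sketch below is an assumption-laden illustration and not the filter actually used: the function name, the 1 ppm matching window, and the rule that the isotope peak is always weaker than its monoisotopic partner (reasonable only for small molecules) are all assumptions, and the second peak in the example is hypothetical.

```python
C13_SHIFT = 1.0033548  # mass difference between 13C and 12C (u)

def remove_c13_peaks(peaks, tol_ppm=1.0):
    """Drop peaks lying one 13C-12C spacing above a more intense peak.
    `peaks` is a list of (m/z, intensity) tuples for singly charged ions."""
    kept = []
    for mz, inten in peaks:
        is_isotope = any(
            abs(mz - (other_mz + C13_SHIFT)) / mz * 1e6 <= tol_ppm
            and other_int > inten
            for other_mz, other_int in peaks
        )
        if not is_isotope:
            kept.append((mz, inten))
    return kept

peaks = [
    (106.049870, 1.36e8),  # monoisotopic peak (value as in Table II)
    (107.053225, 4.5e6),   # hypothetical 13C isotope peak of the line above
    (118.086217, 4.10e6),  # unrelated monoisotopic peak (value as in Table II)
]
filtered = remove_c13_peaks(peaks)
```

The narrow matching window is what ties this step to the instrument: at lower mass accuracy, unrelated peaks roughly 1 u apart would be indistinguishable from isotope pairs.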

Table III Illustration of the database generated from the processed data



Empirical 

C H N 

21 20 0 
25 34 6 
24 22 0 

22 32 6 

46 35 11 
19 17 11 

11 16 4 
9 18 8 
30 67 19 

47 71 7 

22 40 14 

23 24 8 
9 16 8 
2D 28 
22 29 
33 54 

14 29 

15 20 
21 12 
40 34 8 
27 50 2 
21 44 
30 42 

12 24 



Formula 

OPS Mass 

10 0 0 nf 

19 0 0 nf 

13 0 0 nf 

10 0 nf 

10 1 nf 

3 0 0 nf 

9 0 1 nf 
5 0 3 nf 

4 0 0 nf 
3 0 0 nf 

5 0 2 nf 
5 0 1 nf 

10 3 nf 

11 0 1 nf 

10 3 nf 
9 0 0 nf 
13 0 0 nf 

11 0 0 nf 
2 0 1 nf 
0 0 3 nf 
5 0 2 nf 
21 0 2 nf 
17 0 1 707222203 
11 0 1 nf 



Int 
1.30&06 
1.30&06 
130&06 
1.30&06 
1.30B06 
1.30306 
1.30&O6 
130B06 
1.30&06 
1.30&OS 
1.30&06 
1.30&O6 
1.30B06 
130B06 
1.30B06 
1.30B06 
1.30B06 
1.30&O6 
1.30&06 
1.30&06 
1.30&06 
1.30&O6 
5.04&06 
1.30B06 



White Stage 

M3SS Int TOGS 

nf 1.30&06 100 

7231955 5.21 &07 4006 

nf 1.30B06 100 

. nf 1.30BO5 100 

790.2821 2.62EKJ7 2015 

448.1592 aS3B07 2715 

381.0710 1.68&07 1292 

nf 1.30&06 100 

nf 1.30B06 100 

7825697 367007 2323 

6452825 227B07 1746 

5251667 4.15B06 319 

nf 1.30B06 100 

5331550 &75B06 442 

448.1546 1.34B07 1031 

679.4031 1.52B07 1169 

448.1774 1.17&07 900 

nf 1.30&06 100 

nf 130&06 100 

nf 1.30&06 100 

547.3240 1.21B07 931 

nf 1.30B06 100 

707.2220 1.&4&07 385 

nf 1.30E+O5 100 



Turning Stage 

Mass Int TOGS TSMS Mass 

4331130 168B07 1292 1292 4331128 

7231952 1.12B03 8615 215 7211953 

5iai132 ai6&06 243 243 519.1133 

nf 1.30&O6 100 100 397.2714 

7902819 5.71&07 4392 218 7902822 

448.1591 4.88&07 3754 138 448.1592 

381.0710 219&07 1685 130 381.0709 

nf 1.30B06 100 100 4150638 

7585697 a27BC7 2515 2515 7585693 

7825394 3.19&07 2451 87 7825697 

645.2623 271B07 2085 119 645.2825 

525.1663 1.54&07 1185 371 5251664 

349.0683 1.42&03 109 109 349.0685 

5331551 1.54&07 1185 2S8 5331553 

448.1545 1.73&07 1331 129 44ai546 

679.4025 1.58&07 1215 104 679.4028 

448.1774 1.53&07 1177 131 44d1774 

nf 1.30BO3 100 100 377.1078 

nf 1.30&06 100 100 329.0634 

nf 1.30B06 100 100 7232143 

547.3239 1.22&07 938 101 547.3240 

nf 1.30B06 100 100 7251951 

707.2216 5.34&07 1060 275 707.2218 

nf 1.30&06 100 100 4331235 



Red Stage 

Int RS1/GB 

298&06 22923 

1.41B08 10846 

1.21&08 9303 

6.32&07 4862 

4.54007 3492 

4.02&07 3092 

275&07 2115 

269&07 2069 

244&07 1877 

212B07 1631 

212&07 1631 

152B07 1169 

1.5OB07 1154 

138&07 1062 

1.32B07 1015 

1.31&07 1008 

1^3B07 985 
1^4&07 
1.17B07 
1.13&07 
1.06&07 
1.05BO7 
399B<}7 
9.92&06 



954 
900 



22923 

271 
9308 
4862 

173 

114 

164 
2069 
1877 

58 
.93 
366 
1154 
240 

99 

86 
109 
954 
900 



815 
808 
792 
763 



RSTS 

1774 

126 
3829 
4862 

80 

82 

126 
2369 

75 

66 

78 

99 
1056 

90 

76 

83 

84 
95* 
900 
869 

87 



808 
206 
763 



75 
763 



Comments: In Table III, the data was sorted according to the relative expression of
metabolites in the red stage vs the green stage of strawberry. The data can be organized
by any field. What is observed is that the metabolite C21H20O10 has a concentration in
the red stage that is at least 22923% of that observed in the green stage (this
metabolite is not observed in the green stage, so the value is a % of the background
noise). This metabolite can be identified by its empirical formula as
pelargonidin-3-glucoside, the primary pigment observed in strawberries that gives them
their red color. This process is automated.
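The percentage columns of Table III can be reproduced by substituting the background noise level for any peak that was not found (nf). This is a minimal sketch under stated assumptions: the function name is hypothetical, the noise level of 1.30E+06 is the intensity listed against nf entries in Table III, and the red stage intensity of 2.98E+08 is inferred from the 22923% figure rather than read directly.

```python
NOISE_FLOOR = 1.30e6  # intensity assigned to peaks that were not found (nf)

def relative_expression(intensity, reference):
    """Return intensity as a percentage of reference.
    None stands for "nf" and is replaced by the noise floor."""
    num = NOISE_FLOOR if intensity is None else intensity
    den = NOISE_FLOOR if reference is None else reference
    return round(num / den * 100)

red_vs_green = relative_expression(2.98e8, None)  # red stage found, green stage nf
both_missing = relative_expression(None, None)    # nf in both stages
```

Because the denominator is a noise estimate rather than a measured peak, the result for a metabolite absent from the reference sample is a lower bound ("at least"), as the comments above point out.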
Table IV Comparison of Metabolite and Gene Expression Data in Strawberry Color
Formation (Red Stage vs. Green Stage)

                                            Relative      Relative
                                            Metabolite    Gene
Metabolic Pathway                           Expression    Expression
4-Coumarate-CoA to Naringenin Chalcone      4.3           3.3
Naringenin Chalcone to Naringenin           4.3           4.3
Leucopelargonidin to Pelargonidin           20*           6.7
Pelargonidin to Pelargonidin-3-Glucoside    42*           8.3

* Reflects greater dynamic range of metabolic expression analysis



Comments: Figures 7 through 12 and Table IV show the power of non-targeted metabolic
profiling in studying changes that occur during development. Non-targeted metabolic
profiling allows the researcher to monitor entire metabolic pathways simultaneously.
There is no other methodology that allows for the simultaneous analysis of such a diverse
range of analytes. All of the analytes illustrated above were extracted from the
non-targeted data collected using the methodology and concepts presented in this
application.
and identification of unknown metabolites). Relative changes in 3-Methylsulphinylheptyl
Glucosinolate illustrated.

Table V Comparison of Glucosinolates in different Arabidopsis thaliana mutants

Arabidopsis Glucosinolate Mutants

Glucosinolates
R=                        WT         TU1        TU3        TU5        TU7
3-Methylthiobutyl         1.00       <0.06(nf)  2.69       0.14       0.36
3-Methylthiopentyl        1.00       <0.56(nf)  2.12       <0.56(nf)  0.71
3-Methylthioheptyl        1.00       1.00       <0.21(nf)  0.32       <0.21(nf)
3-Methylthiooctyl         1.00       2.93       <0.09(nf)  0.92       0.15
3-Methylsulphinylpropyl   1.00       27.62      1.37       21.56      0.37
3-Methylsulphinylbutyl    1.00       0.10       2.50       0.63       0.53
3-Methylsulphinylpentyl   1.00       1.56       3.11       0.79       1.11
3-Methylsulphinylheptyl   1.00       1.38       <0.37(nf)  0.64       <0.37(nf)
3-Methylsulphinyloctyl    1.00       6.16       <0.11(nf)  4.25       0.37
3-Indolylmethyl           1.00       4.44       0.90       1.85       0.71
Methoxy-3-indolylmethyl   1.00       1.41       0.67       0.59       0.46
C3H7OS                    1.00 (nf)  >6.88      nf         nf         nf
C5H11O8S                  1.00       2.68       0.73       0.85       0.60
C7H10OS3                  1.00 (nf)  >5.73      nf         >3.01      nf
C8H12OS3                  1.00       <0.37(nf)  1.95       <0.37(nf)  0.45
C13H26NO3S                1.00       2.55       1.05       1.18       0.44
C21H23O3                  1.00       2.74       1.21       0.47       0.52

19 Glucosinolate Molecules Observed (17 reported)



Comments: In Table V, the applicability of the technology for comparing genetic mutants
to their wild-type counterparts is illustrated. The non-targeted metabolic profiles of
four mutants (TU1, TU3, TU5, and TU7) were compared to their wild-type counterpart. Here
we show that not only can we identify and monitor the glucosinolates that had been
previously analyzed using targeted analysis, but we were also able to identify
previously unidentified glucosinolates. As is the case in all of our analyses, all of
the other metabolites are also available for evaluation.
Table VI Illustration of database generated by directly comparing two samples (carrot
root exudate in the presence and absence of phosphate). Summary of Metabolites that were
Observed to be Increased in the -P Fraction



-P/+P 
Ratio (Corr.) 

1 1 / AiJ^U 

1053.350 

981.550 

658.650 

186.090 

73.375 

52.845 


Mode 

ESI + 
ES1+ 
ESI+ 
ESJ+ 
ESI+ 




Mass — 

245,0783 

.WOO 

651 '■'412 


Minus P 
. Abs Int. 

2.35E+09 
2,1 1E+09 

•\ Qccj.no 
1 .o2fcz+09 
c». i 2t+Uc 
7.47c+Oa 
1 .Ubfc+Uc 


■ Corr. Int. 
1.17E+09 
1.05E+09 
9.82E+08 
6. 59 E+08 
1 .86E+08 
7.34 E+07 
5.28E+07 




Plu 
Mass 


5 P 

Abs Int. 
1.00E+06 
1.00E+06 
1.00 E+06 
1.00E+06 
1.00E+06 
1.00E+06 




F 

C 
10 
22 
10 
12 
12 
31 


3 roj: 
H 
9 

23 
9 
15 
14 
35 


ose 
N 

6 
6 

6 


d E 

O 
2 
6 
3 
4 
4 
10 


m F 
P 


iri 
S 


sal 
C 


Forr 
Na 


nu 
K 

1 


a 

e' 




Observed 
As 
+H 
+H 
+H 
+H 
+K 
+H 


Theoretical 
Mass 

245.07815 
467.1673589 
177.0546206 
223.0964854 
261.0523672 
651.2409178 


Error 
(ppm) 

0.73 
-0.45 
-0.17 
-0.16 
0.05 
0.48 


47.308 


ESI+ 








4.73E+07 






1.00E+06 
1.00E+06 




15 
31 


22 
35 


1 
6 


7 

8 
















+H 
+H 


328.1390785 
619.2510885 


-0.24 
-0.39 


35.421 
34.279 


ESI+ 
ESI+ 




OJa.ZO 1 O 


7 no crxri'v 
o.obb+U/ 


3.54E+07 
3.4ot+07 






1.00E+06 




28 


43 


6 


6 
















+H 


559.3238596 


0.13 


31.780 


ESI+ 




1(Y7 CI4RQ 

ou i .u^oy 


b.obt+U/ 


3.1 8E+07 






1.00E+06 
1.00E+06 




27 
12 


35 
19 


6 


6 
3 




3 












+H 
+H 


539.2612593 
307.049083 


0.00 
-0.60 


28.136 


ES1+ 






O.OOt+U f 


2.81 E+07 






1.00E+06 




26 


31 


6 


6 
















+H 


523.2299592 


-0.09 


25.510 
24.248 


ESI+ 
ESI- 




569.1988 
279.1236 


5.10E+07 
2.42E+07 


2.55E+07 
2.42E+07 






1.00E+06 




26 


29 


6 


9 


















569.199053 


-0.44 


22.393 


ESI+ 




oJo,oi?i?4 


4.48E+0? 


2,24E+07 






1.00E+06 
LOOE+OB 




15 

34 


19 
47 


6 


5 

6 
















-H 
+H 


279.1237973 
635.3551597 


-0.60 
0.38 


21,312 


ESI+ 






4.4bh+Uf 


2.13E+07 






1.00E+06 




28 


43 


6 


5 
















+H 


543.3289449 


-0.21 


20.003 


i- 






2.00E+07 


2.00E+07 






1.00E+06 




20 


25 




7 
















+H 


377.1594796 


-0.18 


19.937 


- fr^it- 




/jyi .0/14 


3.99E+07 


1.99E+07 






1.00E+06 




11 


15 




9 
















+H 


291.0710585 


1.04 


15 314 


ist+~ 




279.1239 


1.53E+07 


1.53E+07 






1.00E+06 




15 


19 




5 
















-H 


279.1237973 


0.26 








487.2663 


2.66E+07 


1.33E+07 






1.00E+06 




24 


35 


6 


5 
















+H 


487.2663447 


-0.07 


13.273 
13.091 


ESI- 
APCJ- 




335.2227 
335.2230 


6.63E+07 
1.60E+08 


6.63E+07 
1.60E+08 




335.2227 
335.2231 


5.00E+06 




20 


31 




4 
















-H 


cs .3o.242/a31 


-0.40 


12.968 


ESI+ 




242.0700 


2.59E+07 


1.30E+07 






1.22E+07 
1.00E+06 




20 
15 


31 
20 


10 


4 
9 
















-H 
+2H 


335.2227831 
242.0701876 


0.66 
-0.86 


11.693 


ESI+ 




473.2507 


2.34E+07 


1.17E+D7 






1.00E+06 




23 


33 


6 


5 
















+H 


473.2506946 


0.10 


11.236 


ESI- 




167.6111 


1.12E+07 


1.12E+07 






1.00E+06 




18 


29 


3 


3 
















-2H 


167.6109945 


0.33 


9.001 


ESI+ 




149.0233 


4.81 E+08 


2.40E+08 




149.0233 


2.67E+07 




8 


5 




3 
















+H 


149.0233204 


0.00 


8.226 


ESJ+ 




459.2352 


1.65E+07 


8.23E+06 






1.00E+06 




22 


31 


6 


5 
















+H 


459.2350446 


0,36 


8.011 


APCI+ 




319.2267 


3.59E+07 


3.59E+07 




319.2267 


4.48E+06 




20 


31 




3 
















+H 


319.2267713 


-0.22 


7.742 


ESI- 




249.1494 


2.14E+07 


2.14E+07 




249.1494 


2.77E+06 




15 


21 




3 
















-H 


249.1496181 


-0.71 


7.279 


ESI- 




333.2071 


1.43E+07 


1.43E+07 




333.2071 


1.96E+06 




20 


29 




4 
















-H 


333.207133 


-0.13 


7.163 


ESI+ 




483.1415 


1.43E+07 


7.16E+06 






1.00E+06 




24 


28 




8 










1 






+K 


483.1415762 


-0.12 


6.902 


ES1- 




347.1864 


1.15E+07 


1.15E+07 




347.1864 


1.6SE+06 




20 


27 




5 
















-H 


347.1863976 


-0.11 


6.655 


APCI- 




263.1290 


6.66E+06 


6.66E+06 






1.00E+06 




15 


19 




4 
















-H 


263.1288827 


0.26 


6.270 


APCI- 




347.1867 


1.87E+07 


1.87E+07 




347.1867 


2.98E+06 




20 


27 




5 
















-H 


347.1863976 


0.83 


6.019 


ESI+ 




345.1258 


1.20E+07 


6.02E+06 






1.00E+06 




14 


22 


6 






1 






1 






+K 


345.1258237 


-0.01 


5.306 


ESi- 




263.1287 


5.31 E+06 


5.31E+06 






1.00E+06 




15 


19 




4 
















-H 


263.1288827 


-0.69 


5.300 


ESI+ 




229.1047 


1.06E+07 


5.30E+06 






1.00E+06 




15 


17 








1 












+H 


229.1045477 


0.75 


4.971 


ES1- 




191.1076 


4.97E+06 


4.97E+06 






1.00E+06 




12 


15 




2 
















-H 


191.1077533 


-0.60 


4.603 


ESI- 




213.1494 


2.32E+07 


2.32E+07 




213.1494 


5.03E+06 




12 


21 




3 
















-H 


213.1496181 


-1.02 


4.600 
4.524 


ESI- 
APCI- 




277.1443 
333.2074 


4.60E+06 
2.20E+07 


4.60E+06 
2.20E+07 






1.00E+06 




16 


21 




4 
















-H 


277.1445327 


-0.84 


4.163 


ESI- 




199.1341 


1.18E+07 


1.18E+07 




333.2075 
199.1341 


4.87E+06 
2.83E+06 




20 
11 


29 
19 




4 
3 
















-H 
-H 


333.207133 
199.1339681 


0.97 
0.61 


3.392 


ES1- 




227.1650 


3.17E+07 


3.17E+07 




227.1650 


9.33E+06 




13 


23 




3 
















-H 


227.1652682 


-1.05 


3.131 


ESI+ 




312.1441 


6.26E+06 


3.13E+06 






1.00E+06 




15 


22 


1 


6 
















+H 


312.1441639 


-0.08 


3.111 


APCI- 




249.1497 


1.54E+07 


1.54E+07 




249.1497 


4.95E+06 




15 


21 




3 
















-H 


249.1496181 


0.19 


2.566 


APCU 




329.2336 


2.29E+07 


2.29E+07 




329.2335 


6.92E+06 




18 


33 




5 
















-H 


329.2333477 


0.58 


2.438 


ESI- 




415.1794 


2.44E+06 


2.44E+06 






1.00E+06 




20 


31 




7 




1 












-H 


415.1795976 


-0.60 


2.017 | 


ESI+ 




285.0951 


4.03E+06 


2.02E+06 






1.00E+06 




10 


17 


6 




-nr 












+H 


285.0950624 


-0.01 



Comments: Table VI illustrates how our technology can be used to compare the metabolic 
profile of an organism under different environmental conditions. Here we were able to 
detect and identify key molecules involved in controlling the plant's response to phosphate 
conditions. This capability allows researchers to determine what effects changes in 
environmental conditions will have on the biological functions of an organism. 
Table VII MS/MS Data for Selected Metabolites Observed to be Increased in the -P
Fraction



Jrarem 


T7t*c» mm P*ni" 
J? 1 dgjlllClJLL 


T o« Of- 


C 3 iH35N 6 Oio[H + l 


Ci 9 H23N 6 0 5 [H + ] 


C12H12O5 


651 


CiqH9lNfiOdrH + l 




+ESI 


*CinHoN/;09rH + l 






CoH^FH" 1 *! 




C 3 iH35N 6 0 8 [H + ] 


C 19 H 2 3N 6 0 5 [H + ] 


C12H1203 


619 


Ci9H 2 iN60 4 rH + l 


C12H1404 




*CmHQN^09rH + l 






CoH7fH + l 




C26H 2 9N 6 09[H + ] 


C 19 H23N 6 05[H + ] 


C 7 H 6 0 4 




CiaH^NsOJH 1 ! 

^19- LJ -21 J - N o v - / 4L J - A J 






♦CinHoNrfO?^! 






C Q H7rH + l 




C28H 4 3N 6 06[H + ] 


C 19 H 2 3N 6 0 5 [H + ] 


C 9 H 20 O 


559 


C,QH 2 lN604rH + l 


C 9 H 22 0 2 


+ESI 


*CioHqN602[H + 1 


Ci f?H?n04 




CgH7rH + l 




^28 ri 43 iN 6^5 L 1 ^ J 


CioHo^N^OsrH 4 "! 




543 


C 19 H 21 N 6 0 4 [H + ] 


C 9 H 22 0 


+ESI 


*CinH 9 N60 2 rH + l 


Ci RrlonOs 




CoH 7 rH + l 




C 27 H35N 6 0 6 [H + ] 


C 19 H 23 N 6 0 5 [H + ] 


C 8 H 12 0 


539 


C 19 H 21 N 6 0 4 [H + ] 


CgHi 4 0 2 


+ESI 


*C 15 H 21 N 6 02[H + ] 


*C 12 H, 4 0 4 




Ci 0 H 9 N 6 O 2 [H + ] 


Ci 7 H260 4 




C 9 H 7 [H + ] 




C 26 H 31 N 6 0 6 [H + ] 


C 19 H23N 6 0 5 [H + ] 


C 7 H 9 0 


523 


C 19 H 2 iN 6 0 4 [H + ] 


C 7 Hi 0 O 2 


+ESI 


*C 14 H 17 N 6 0 2 [H + ] 


*C 12 H 14 0 4 
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r~\ TT XT (~\ PT T^~H 
C22H23N6O6IH J 


*/~^ tj \t r\ rTJ + i 
5tt CioH9N6U2L-n- J 


*Ci2Hi 4 U 4 


4o/ 






+ESI 








*C^HoOarH + l 


CoH^O 


923 


CoH70tFH + 1 


CJHUO 


+ESI 


C 8 H 5 0 3 [H + ] 


C4H10O 




CJ4sOrH + l 




^1 0x19^3 L* 1 J 




v^2 A ■ t 4 


177 


C 6 H 5 0[H + ] 


C4H4O2 


+ESI 






*C 8 H 5 0 3 [H + ] 


C 7 H 5 0 2 [H + ] 


CO 


149 


C 6 H 5 0[H + ] 


c 2 o 2 


+ESI 
Table VIII Determination of Metabolite Relations using MS/MS data



Rl 


R3 


R2 


CioH 8 N 6 02 


None 


T T /~\ 
U12H14U4 




CaHr 


C ,n H 1 a \ J a 
^LZ X x 14^4 


CioH 8 N 6 0 2 


C5H12 


C12H14O4 


CioHgNeOi 


C 6 H 6 


C12H14O4 


C,oH 8 N 6 02 


C 4 H 6 0 3 


C12H14O4 


CioH 8 N 6 02 


C9H10O2 


C12H14O4 


C l0 H 8 N 6 O2 


C9H10O4 


C12H14O4 


CioH 8 N 6 02 


C 6 H 6 


C12H14O3 
Table IX. Mass Analysis of unknown peak observed in Tobacco Flower Analysis

Mass Analysis of Unknown Peak

Calibration Constants:
ML1: 108299134.679450
ML2: -16.576817
ML3: -2029.796744

Calibration Results:
Ref. Masses    Exp. Masses    Diff (ppm)
124.039300     124.039298     0.0187
161.092070     161.092079     0.0542
303.166300     303.166272     0.0919
609.280660     609.280664     0.0060
962.430130     962.430230     0.1037

Observed Mass of Unknown: 595.16572

Empirical Formula Search Result: C27H30O15 [+H]+
Mass: 595.16575
Mass Error: 0.04 ppm

Proposed Metabolite: C15H10O6-Rhamnoglucoside
(present in flowers of grapefruit)



Comments: Figures 16 and 17 and Table IX show how our technology provides
meaningful information that would otherwise not be obtained. In this example the
researcher thought that he knew the primary color component in tobacco flowers
(C15H10O6-Glucoside) but our analysis showed that the primary color component in
tobacco flowers is actually the rhamnoglucoside. This illustrates the power of being able
to identify unknown components after analysis. No other technology is currently available
to provide this type of analysis.
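The mass error quoted in Table IX is the relative difference between the observed and theoretical masses, expressed in parts per million. The sketch below recomputes it for the proposed ion; the function name is hypothetical, the isotope and electron masses are standard values, and the sign convention (observed minus theoretical) is an assumption, since the table reports only the magnitude.

```python
def ppm_error(measured, theoretical):
    """Relative mass error in parts per million (measured minus theoretical)."""
    return (measured - theoretical) / theoretical * 1e6

# Theoretical mass of the protonated ion of C27H30O15, i.e. C27H31O15 with one
# electron removed.
ELECTRON = 0.00054858
theoretical = 27 * 12.0 + 31 * 1.00782503 + 15 * 15.9949146 - ELECTRON

error = ppm_error(595.16572, theoretical)  # observed mass from Table IX
```

The same function, applied to each reference and experimental mass pair, gives differences of the same order as the calibration results listed in the table.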
Table X Illustration of the number of metabolites monitored in strawberry extracts.
Summary of Metabolites Observed from Different Extraction Methods and Ionization
Conditions.

Number of Unique Metabolites Observed

            50/50     ACN       In Both   Total
ESI +       1143      1054      540       1657
ESI -       966       790       211       1545
APCI +      979       1431      615       1795
APCI -      898       1205      370       1733
Total       3986      4480      1736      6730



Table X and Figure 18 illustrate the comprehensive nature of our invention. Our
technology allows for the comprehensive comparison of the metabolic profiles of
organisms under varying environmental, genetic, and developmental conditions.
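The Total column of Table X follows from counting each metabolite once, even when it is seen in both extracts: the two extract counts are added and the overlap is subtracted. The short check below is an illustration using the figures from Table X; the function and variable names are hypothetical.

```python
def total_unique(n_first, n_second, n_both):
    """Number of distinct metabolites seen in either extract."""
    return n_first + n_second - n_both

# (50/50 extract, ACN extract, seen in both) per ionization mode, from Table X.
counts = {
    "ESI+": (1143, 1054, 540),
    "ESI-": (966, 790, 211),
    "APCI+": (979, 1431, 615),
    "APCI-": (898, 1205, 370),
}
totals = {mode: total_unique(*c) for mode, c in counts.items()}
grand_total = sum(totals.values())
```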

In this patent document, the word "comprising" is used in its non-limiting sense to mean
that items following the word are included, but items not specifically mentioned are not
excluded. A reference to an element by the indefinite article "a" does not exclude the
possibility that more than one of the element is present, unless the context clearly requires
that there be one and only one of the elements.

It will be apparent to one skilled in the art that modifications may be made to the illustrated 
embodiment without departing from the spirit and scope of the invention as hereinafter 
defined in the Claims.

THE EMBODIMENTS OF THE INVENTION IN WHICH AN EXCLUSIVE
PROPERTY OR PRIVILEGE IS CLAIMED ARE DEFINED AS FOLLOWS: 



1. A method for non-targeted complex sample analysis, comprising the steps of:

providing a known molecule database (16) containing identifying data of known molecules;

introducing a complex sample containing multiple unidentified molecules into a Fourier
Transform Ion Cyclotron Mass Spectrometer (FTMS) (12) to obtain identifying and
quantitation data regarding the molecules in the complex sample;

comparing the collected data regarding the molecules in the complex sample with the
identifying data of known molecules in order to arrive at the identification of the
molecules in the sample; and

the creation of a non-targeted metabolite database from all the identifying and quantitation
data collected from the complex sample.

2. The method as defined in Claim 1, the complex sample being a biological sample. 

3. The method as defined in Claim 1, the complex sample being a combinatorial chemistry 
synthesis sample. 



4. The method as defined in Claim 1, the identifying data being the experimentally
determined empirical formula of the parent molecule and whose theoretical mass agrees to
within 1.0 ppm relative error of the experimentally measured mass.

5. The method as defined in Claim 1, the identifying data being the accurate mass of the
parent molecules experimentally determined with a relative error of determination less than
1.0 ppm.

6. The method as defined in Claim 1, the identifying data being the accurate mass of the 
fragments of the parent molecules experimentally determined with a relative error of 
determination less than 5.0 ppm. 

7. The method as defined in Claim 1, the identifying data being the experimentally
determined empirical formula of the fragment molecules of the parent molecules and
whose theoretical mass agrees to within 5.0 ppm relative error of the experimentally
measured mass.

8. The method as defined in Claim 1, the quantitation data being the relative and/or 
absolute intensity of the parent molecule.

9. The method as defined in Claim 1, the quantitation data being the relative and/or 
absolute intensity of the fragment molecules. 

10. The method as defined in Claim 1, the non-targeted metabolite database being
organized to permit searching for known metabolites by accurate mass (defined as 
measured mass with less than 1.0 ppm relative error). 

11. The method as defined in Claim 1, the non-targeted metabolite database being 
organized to permit searching for known metabolites by empirical formula.

12. The method as defined in Claim 1, the non-targeted metabolite database being
organized to permit identification of metabolites by the accurate mass of the parent 
molecule. 

13. The method as defined in Claim 1, the non-targeted metabolite database being 
organized to permit identification of metabolites by the empirical formula of the parent 
molecule. 

14. The method as defined in Claim 1, the non-targeted metabolite database being
organized to permit identification of metabolites by the empirical formulas of the 
fragments of the parent molecule. 

15. The method as defined in Claim 1, the non-targeted metabolite database being
organized to permit identification of metabolites by the accurate masses of the fragments
of the parent molecule.

16. The method as defined in Claim 1, the non-targeted metabolite database being
organized to permit the comparison of two samples to each other such that the relative
intensity, presence, and/or absence of each metabolite is determined.

17. The method as defined in Claim 1, the non-targeted metabolite database being
organized to permit the comparison of one or more "test" samples to a "control" sample 
such that the intensity, presence, and/or absence of the metabolites present in the "test" 
samples can be determined relative to the control sample and other test samples. 

18. The method as defined in Claim 1, 16, or 17, the non-targeted metabolite database
being organized to permit for the sorting, presenting and reporting of the data in ascending 
or descending order of the relative intensities determined. 

19. The method as defined in Claim 1, 16, or 17, the non-targeted metabolite database
being organized to permit for the sorting, presenting and reporting of the data according to
the accurate mass of the fragments of the parent molecules.

20. The method as defined in Claim 1, 16, or 17, the non-targeted metabolite database
being organized to permit for the sorting, presenting and reporting of the data according to
the empirical formulas of the fragments of the parent molecules.

21. The method as defined in Claim 16, 17, 18, 19, or 20, the correlation of the data contained
within the non-targeted metabolite database from biological samples from a genetically
modified "test" organism and its non-genetically modified "control" organism with gene
expression data from same said organisms for the purpose of determining the function of
the genes affected by the genetic modification.

22. The method as defined in Claim 16, 17, 18, 19, or 20, the correlation of the data contained
within the non-targeted metabolite database from biological samples from an organism
exposed to a "test" environment and a "control" environment with gene expression data
from same said organism under same said conditions for the purpose of determining the
function of the genes affected by the test environment.

23. The method as defined in Claim 22, the test environment is deemed to be any internal
or external force imparted on the organism that may have an impact on its function.
Examples include but are not limited to: exposure to or withdrawal from a drug, pesticide,
nutrient, or other chemical entity; weather conditions such as drought, frost, or heat; and
psychological conditions such as stress.

24. The method as defined in Claim 16, 17, 18, 19, or 20, the correlation of the data contained
within the non-targeted metabolite database from biological samples from an organism at
different stages of its development with gene expression data from same said organism at
same said stages of its development for the purpose of determining the function of the
genes affected by the changes in development of the organism.

25. The method as defined in Claim 16, 17, 18, 19, or 20, the correlation of the data contained
within the non-targeted metabolite database from biological samples from a healthy organism
and a diseased organism with gene expression data from same said organisms for the
purpose of determining the function of the genes affected by the disease state of the
organism.